I've run into a small problem where my for
loop has been running for 3.5 hrs+. I have a Geodataset in SQL Server
with a field called Shape
that stores co-ordinates in geometry
data type. Firstly, I connect R to my DB via ODBC
and retrieve the information I want (also converting the Shape
column to something readable)
library(sp)
library(rgeos)
library(RODBC)
ch<-odbcConnect("SpatialAnalysis", rows_at_time=1)
df<-sqlQuery(ch, "select OBJECTID, LOT_NO, Shape.STAsText() as WKT FROM SRC_PLI_QLD")
cnt<-sqlQuery(ch, "select count(OBJECTID) from SRC_PLI_QLD")
This has 2.5 million points. I now read them into a SpatialPointsDataFrame
, reading the first element first.
point.sp <- SpatialPointsDataFrame(readWKT(df$WKT[1]),
data=data.frame(OBJECTID=df$OBJECTID[1], LOT_NO=df$LOT_NO[1]))
Now I read the rest of the elements. AND this is where the problem is. It has been 3.5 hrs and still running.
for (n in 2:as.integer(cnt)) {
point.sp <- rbind(point.sp,
SpatialPointsDataFrame(readWKT(df$WKT[n]),
data.frame(OBJECTID=df$OBJECTID[n], LOT_NO=df$LOT_NO[n])))
}
What is the problem in the above mentioned for
loop? Is there another way I can do this?
2 Answers 2
You could do
library(rgeos)
library(sp)
xy <- do.call(rbind, lapply(df$WKT, function(x) coordinates(readWKT(x))))
Note: it turns out you can vectorize readWKT by pasting all the wkt text strings together, the underlying code splits them. Not really sure why this is the case, it's inconvenient.
But it's kind of pointless given that every object is just a single coordinate pair. You could use a similar approach for polygons and lines, but it's slightly different. SpatialPoints
don't really have the same structure as the other classes, though SpatialMultipoints
do . . .
Then you need to reconstruct the Spatial stuff:
point.sp <- SpatialPointsDataFrame(SpatialPoints(xy), data = df, match.ID = TRUE)
Just set match.ID
to FALSE if it gives you grief. SpatialPoints
has a proj4string
argument where you can pass in the CRS(proj.4)
.
For points (but not multipoints) I would just use the db query to get x/y, then use coordinates(df) <- c("x", "y")
.
-
Thanks. My main goal is to do this with a Polygon dataset and then do an intersect between the Points and Polygon dataset to retrieve more attributes. I am doing all this correctly?CuriousBeing– CuriousBeing2016年02月27日 09:50:40 +00:00Commented Feb 27, 2016 at 9:50
-
BTW the rbind, do.call method takes equally the same time, as lapply is based on a for-loop.CuriousBeing– CuriousBeing2016年02月27日 10:04:32 +00:00Commented Feb 27, 2016 at 10:04
-
Looks like you are right, did not expect that. Use WKB and wkb::readWKB instead. Text is inefficient.mdsumner– mdsumner2016年02月27日 10:15:57 +00:00Commented Feb 27, 2016 at 10:15
-
1Another option is to use readOGR() with the MSSQLSpatial driver.mdsumner– mdsumner2016年02月27日 10:17:16 +00:00Commented Feb 27, 2016 at 10:17
-
Thanks heaps for your help. I solved it another way, answer belowCuriousBeing– CuriousBeing2016年02月27日 10:46:10 +00:00Commented Feb 27, 2016 at 10:46
I solved this in another way and it took less than 10 seconds. The change was in the query. Instead of retrieving Shape.STAsText()
as a WKT object from Shape
column, I retrieved the Lat
and Long
value from Shape
df<-sqlQuery(ch, "select OBJECTID, LOT_NO, Shape.STY as Lat, Shape.STX as Lon FROM SRC_PLI_QLD")
cnt<-sqlQuery(ch, "select count(OBJECTID) from SRC_PLI_QLD")
Then I did :
coordinates(df) =~Lat+Lon
and my dataframe df
has been converted to a SpatialPointsDataFrame
> class(df)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"
Shape
column that has values like this :x110F0000010407000000DCD781517FD76F4140FAED5B2EFF46C190C2F53C9FD76F41A0703D5242FF46C1C8BAB873A1D76F416888632504FF46C1B0BFECF4A3D76F41801D38F7C5FE46C19031770984D76F41A080264AB5FE46C114D0448881D76F4100B37B6AF2FE46C1DCD781517FD76F4140FAED5B2EFF46C101000000020000000001000000FFFFFFFF0000000003
. These are my co-ordinates. I am extracting them out usingShape.AsText()
. Is there a better way?